multinomial mixture model
Towards the Identifiability in Noisy Label Learning: A Multinomial Mixture Approach
Nguyen, Cuong, Do, Thanh-Toan, Carneiro, Gustavo
Learning from noisy labels (LNL) plays a crucial role in deep learning. The most promising LNL methods rely on identifying clean-label samples from a dataset with noisy annotations. Such an identification is challenging because the conventional LNL problem, which assumes a single noisy label per instance, is non-identifiable, i.e., clean labels cannot be estimated theoretically without additional heuristics. In this paper, we aim to formally investigate this identifiability issue using multinomial mixture models to determine the constraints that make the problem identifiable. Specifically, we discover that the LNL problem becomes identifiable if there are at least $2C - 1$ noisy labels per instance, where $C$ is the number of classes. To meet this requirement without relying on additional $2C - 2$ manual annotations per instance, we propose a method that automatically generates additional noisy labels by estimating the noisy label distribution based on nearest neighbours. These additional noisy labels enable us to apply the Expectation-Maximisation algorithm to estimate the posterior probabilities of clean labels, which are then used to train the model of interest. We empirically demonstrate that our proposed method is capable of estimating clean labels without any heuristics in several label noise benchmarks, including synthetic, web-controlled, and real-world label noises. Furthermore, our method performs competitively with many state-of-the-art methods.
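The Expectation-Maximisation step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `em_multinomial_mixture`, the synthetic data, and the smoothing constants are assumptions made for illustration, and the nearest-neighbour generation of additional noisy labels is not shown. Each instance is summarised by a count vector of its noisy labels (here $2C - 1 = 5$ labels for $C = 3$ classes), and the mixture component plays the role of the clean label.

```python
import numpy as np

def em_multinomial_mixture(counts, n_components, n_iters=100, seed=0):
    """Fit a multinomial mixture with EM (illustrative sketch).

    counts: (N, C) array; counts[i, c] is how often noisy label c was
            observed for instance i.
    Returns mixing weights pi (K,), component distributions theta (K, C),
    and posterior responsibilities resp (N, K).
    """
    rng = np.random.default_rng(seed)
    n_labels = counts.shape[1]
    pi = np.full(n_components, 1.0 / n_components)
    theta = rng.dirichlet(np.ones(n_labels), size=n_components)
    for _ in range(n_iters):
        # E-step: posterior over components; the multinomial coefficient
        # is the same for every component and cancels out.
        log_r = np.log(pi)[None, :] + counts @ np.log(theta).T
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and component distributions.
        pi = np.clip(resp.mean(axis=0), 1e-12, None)
        pi /= pi.sum()
        theta = resp.T @ counts + 1e-8  # small constant avoids log(0)
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, resp

# Hypothetical synthetic data: 3 classes, 5 noisy labels per instance.
data_rng = np.random.default_rng(1)
true_theta = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
z = data_rng.integers(0, 3, size=200)
counts = np.stack([data_rng.multinomial(5, true_theta[k]) for k in z])
pi, theta, resp = em_multinomial_mixture(counts, n_components=3)
```

The rows of `resp` are the estimated posterior probabilities over components; recovered components are only defined up to a permutation of their indices.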
Generalized Identifiability Bounds for Mixture Models with Grouped Samples
Vandermeulen, Robert A., Saitenmacher, René
Recent work has shown that finite mixture models with $m$ components are identifiable, while making no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ which are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model are linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness" along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.
An Operator Theoretic Approach to Nonparametric Mixture Models
Vandermeulen, Robert A., Scott, Clayton D.
When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability -- where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce general spectral algorithms for recovering the mixture components and illustrate their use on a synthetic data set.
Collaborative Filtering and the Missing at Random Assumption
Marlin, Benjamin, Zemel, Richard S., Roweis, Sam, Slaney, Malcolm
Rating prediction is an important application and a popular research topic in collaborative filtering. However, both the validity of learning algorithms and the validity of standard testing procedures rest on the assumption that missing ratings are missing at random (MAR). In this paper we present the results of a user study in which we collect a random sample of ratings from current users of an online radio service. An analysis of the rating data collected in the study shows that the sample of random ratings has markedly different properties than ratings of user-selected songs. When asked to report on their own rating behaviour, a large number of users indicate they believe their opinion of a song does affect whether they choose to rate that song, a violation of the MAR condition. Finally, we present experimental results showing that incorporating an explicit model of the missing data mechanism can lead to significant improvements in prediction performance on the random sample of ratings.
Recommender Systems: Missing Data and Statistical Model Estimation
Marlin, Benjamin M. (University of British Columbia) | Zemel, Richard S. (University of Toronto) | Roweis, Sam T. (New York University) | Slaney, Malcolm (Yahoo! Research)
The goal of rating-based recommender systems is to make personalized predictions and recommendations for individual users by leveraging the preferences of a community of users with respect to a collection of items like songs or movies. Recommender systems are often based on intricate statistical models that are estimated from data sets containing a very high proportion of missing ratings. This work describes evidence of a basic incompatibility between the properties of recommender system data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. We discuss the ...
The personalization aspect of recommender systems makes them well suited to applications in electronic commerce and entertainment, while the fact that they do not rely on text-based descriptions of items makes them well suited to content like movies and music. In this paper, we focus on a key problem in rating-based collaborative filtering: the possibility of a basic incompatibility between the properties of recommender system data sets and the assumptions required for valid estimation and evaluation of statistical models in the presence of missing data. We describe properties of recommender system data sets and relate them to the statistical theory of model estimation in the presence of nonrandom missing data. We describe an extended modelling framework and a modified set of evaluation protocols for dealing with nonrandom missing data.
Modeling User Rating Profiles For Collaborative Filtering
In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each.
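The generative process described in the abstract above can be sketched as follows. This is an illustrative reading of the abstract only, not the paper's code: the function name `sample_urp_profile`, the array layout of `beta`, and all parameter values are assumptions.

```python
import numpy as np

def sample_urp_profile(alpha, beta, rng):
    """Sample one complete user rating profile (illustrative sketch).

    alpha: (K,) Dirichlet parameters over K user attitudes.
    beta:  (K, M, V) array; beta[k, m] is the distribution over V rating
           values for item m under attitude k (the preference pattern).
    """
    n_attitudes, n_items, n_values = beta.shape
    # Mixing proportions over attitudes for this user.
    theta = rng.dirichlet(alpha)
    # Select one attitude per item.
    attitudes = rng.choice(n_attitudes, size=n_items, p=theta)
    # Select a rating for each item from the chosen attitude's pattern.
    ratings = np.array([
        rng.choice(n_values, p=beta[z, m]) for m, z in enumerate(attitudes)
    ])
    return ratings + 1, attitudes, theta  # ratings on a 1..V scale

# Hypothetical sizes: 3 attitudes, 10 items, ratings from 1 to 5.
rng = np.random.default_rng(0)
alpha = np.ones(3)
beta = rng.dirichlet(np.ones(5), size=(3, 10))
ratings, attitudes, theta = sample_urp_profile(alpha, beta, rng)
```

Each call produces one rating for every item, matching the abstract's description of a complete user rating profile.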